426 research outputs found

    Expression cartography of human tissues using self organizing maps

    Get PDF
    Background: The availability of parallel, high-throughput microarray and sequencing experiments poses a challenge how to best arrange and to analyze the obtained heap of multidimensional data in a concerted way. Self organizing maps (SOM), a machine learning method, enables the parallel sample- and gene-centered view on the data combined with strong visualization and second-level analysis capabilities. The paper addresses aspects of the method with practical impact in the context of expression analysis of complex data sets.
Results: The method was applied to generate a SOM characterizing the whole genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of expression data from ten thousands of genes to a few thousands of metagenes where each metagene acts as representative of a minicluster of co-regulated single genes. Tissue-specific and common properties shared between groups of tissues emerge as a handful of localized spots in the tissue maps collecting groups of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to pre-defined gene sets of known functional impact. We found that tissue related spots typically contain enriched populations of gene sets well corresponding to molecular processes in the respective tissues. Analysis techniques normally used at the gene-level such as two-way hierarchical clustering provide a better signal-to-noise ratio and a better representativeness of the method if applied to the metagenes. Metagene-based clustering analyses aggregate the tissues into essentially three clusters containing nervous, immune system and the remaining tissues. 
Conclusions: The global view on the behavior of a few well-defined modules of correlated and differentially expressed genes is more intuitive and more informative than the separate discovery of the expression levels of hundreds or thousands of individual genes. The metagene approach is less sensitive to a priori selection of genes. It can detect a coordinated expression pattern whose components would not pass single-gene significance thresholds and it is able to extract context-dependent patterns of gene expression in complex data sets.
&#xa

    Mining SOM expression portraits: Feature selection and integrating concepts of molecular function

    Get PDF
    Background: 
Self organizing maps (SOM) enable the straightforward portraying of high-dimensional data of large sample collections in terms of sample-specific images. The analysis of their texture provides so-called spot-clusters of co-expressed genes which require subsequent significance filtering and functional interpretation. We address feature selection in terms of the gene ranking problem and the interpretation of the obtained spot-related lists using concepts of molecular function.

Results: 
Different expression scores based either on simple fold change-measures or on regularized Students t-statistics are applied to spot-related gene lists and compared with special emphasis on the error characteristics of microarray expression data. The spot-clusters are analyzed using different methods of gene set enrichment analysis with the focus on overexpression and/or overrepresentation of predefined sets of genes. Metagene-related overrepresentation of selected gene sets was mapped into the SOM images to assign gene function to different regions. Alternatively we estimated set-related overexpression profiles over all samples studied using a gene set enrichment score. It was also applied to the spot-clusters to generate lists of enriched gene sets. We used the tissue body index data set, a collection of expression data of human tissues, as an illustrative example. We found that tissue related spots typically contain enriched populations of gene sets well corresponding to molecular processes in the respective tissues. In addition, we display special sets of housekeeping and of consistently weak and highly expressed genes using SOM data filtering. 

Conclusions:
The presented methods allow the comprehensive downstream analysis of SOM-transformed expression data in terms of cluster-related gene lists and enriched gene sets for functional interpretation. SOM clustering implies the ability to define either new gene sets using selected SOM spots or to verify and/or to amend existing ones

    Analysis of large-scale molecular biological data using self-organizing maps

    Get PDF
    Modern high-throughput technologies such as microarrays, next generation sequencing and mass spectrometry provide huge amounts of data per measurement and challenge traditional analyses. New strategies of data processing, visualization and functional analysis are inevitable. This thesis presents an approach which applies a machine learning technique known as self organizing maps (SOMs). SOMs enable the parallel sample- and feature-centered view of molecular phenotypes combined with strong visualization and second-level analysis capabilities. We developed a comprehensive analysis and visualization pipeline based on SOMs. The unsupervised SOM mapping projects the initially high number of features, such as gene expression profiles, to meta-feature clusters of similar and hence potentially co-regulated single features. This reduction of dimension is attained by the re-weighting of primary information and does not entail a loss of primary information in contrast to simple filtering approaches. The meta-data provided by the SOM algorithm is visualized in terms of intuitive mosaic portraits. Sample-specific and common properties shared between samples emerge as a handful of localized spots in the portraits collecting groups of co-regulated and co-expressed meta-features. This characteristic color patterns reflect the data landscape of each sample and promote immediate identification of (meta-)features of interest. It will be demonstrated that SOM portraits transform large and heterogeneous sets of molecular biological data into an atlas of sample-specific texture maps which can be directly compared in terms of similarities and dissimilarities. Spot-clusters of correlated meta-features can be extracted from the SOM portraits in a subsequent step of aggregation. This spot-clustering effectively enables reduction of the dimensionality of the data in two subsequent steps towards a handful of signature modules in an unsupervised fashion. Furthermore we demonstrate that analysis techniques provide enhanced resolution if applied to the meta-features. The improved discrimination power of meta-features in downstream analyses such as hierarchical clustering, independent component analysis or pairwise correlation analysis is ascribed to essentially two facts: Firstly, the set of meta-features better represents the diversity of patterns and modes inherent in the data and secondly, it also possesses the better signal-to-noise characteristics as a comparable collection of single features. Additionally to the pattern-driven feature selection in the SOM portraits, we apply statistical measures to detect significantly differential features between sample classes. Implementation of scoring measurements supplements the basal SOM algorithm. Further, two variants of functional enrichment analyses are introduced which link sample specific patterns of the meta-feature landscape with biological knowledge and support functional interpretation of the data based on the ‘guilt by association’ principle. Finally, case studies selected from different ‘OMIC’ realms are presented in this thesis. In particular, molecular phenotype data derived from expression microarrays (mRNA, miRNA), sequencing (DNA methylation, histone modification patterns) or mass spectrometry (proteome), and also genotype data (SNP-microarrays) is analyzed. It is shown that the SOM analysis pipeline implies strong application capabilities and covers a broad range of potential purposes ranging from time series and treatment-vs.-control experiments to discrimination of samples according to genotypic, phenotypic or taxonomic classifications

    The problem of the density of sea water

    Get PDF
    The Density of Aqueous Solutions The density of a solution can be determined easily and is of interest not only for its own sake but also in the calculation of other physical quantities. Many investigators have therefore determined the density and concentration of aqueous solutions. Of the older works, those of Kohlrausch, Kohlrausch and Hallwachs, Baxter and Wallace, and Lamb and Lee are outstanding for their precision...

    Expression cartography of human tissues using self organizing maps

    Get PDF
    Background: The availability of parallel, high-throughput microarray and sequencing experiments poses a challenge how to best arrange and to analyze the obtained heap of multidimensional data in a concerted way. Self organizing maps (SOM), a machine learning method, enables the parallel sample- and gene-centered view on the data combined with strong visualization and second-level analysis capabilities. The paper addresses aspects of the method with practical impact in the context of expression analysis of complex data sets.
Results: The method was applied to generate a SOM characterizing the whole genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of expression data from ten thousands of genes to a few thousands of metagenes where each metagene acts as representative of a minicluster of co-regulated single genes. Tissue-specific and common properties shared between groups of tissues emerge as a handful of localized spots in the tissue maps collecting groups of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to pre-defined gene sets of known functional impact. We found that tissue related spots typically contain enriched populations of gene sets well corresponding to molecular processes in the respective tissues. Analysis techniques normally used at the gene-level such as two-way hierarchical clustering provide a better signal-to-noise ratio and a better representativeness of the method if applied to the metagenes. Metagene-based clustering analyses aggregate the tissues into essentially three clusters containing nervous, immune system and the remaining tissues. 
Conclusions: The global view on the behavior of a few well-defined modules of correlated and differentially expressed genes is more intuitive and more informative than the separate discovery of the expression levels of hundreds or thousands of individual genes. The metagene approach is less sensitive to a priori selection of genes. It can detect a coordinated expression pattern whose components would not pass single-gene significance thresholds and it is able to extract context-dependent patterns of gene expression in complex data sets.
&#xa

    Developmental scRNAseq Trajectories in Gene- and Cell-State Space—The Flatworm Example

    Get PDF
    Single-cell RNA sequencing has become a standard technique to characterize tissue development. Hereby, cross-sectional snapshots of the diversity of cell transcriptomes were transformed into (pseudo-) longitudinal trajectories of cell differentiation using computational methods, which are based on similarity measures distinguishing cell phenotypes. Cell development is driven by alterations of transcriptional programs e.g., by differentiation from stem cells into various tissues or by adapting to micro-environmental requirements. We here complement developmental trajectories in cell-state space by trajectories in gene-state space to more clearly address this latter aspect. Such trajectories can be generated using self-organizing maps machine learning. The method transforms multidimensional gene expression patterns into two dimensional data landscapes, which resemble the metaphoric Waddington epigenetic landscape. Trajectories in this landscape visualize transcriptional programs passed by cells along their developmental paths from stem cells to differentiated tissues. In addition, we generated developmental “vector fields” using RNA-velocities to forecast changes of RNA abundance in the expression landscapes. We applied the method to tissue development of planarian as an illustrative example. Gene-state space trajectories complement our data portrayal approach by (pseudo-)temporal information about changing transcriptional programs of the cells. Future applications can be seen in the fields of tissue and cell differentiation, ageing and tumor progression and also, using other data types such as genome, methylome, and also clinical and epidemiological phenotype data

    Melanoma Single-Cell Biology in Experimental and Clinical Settings

    Get PDF
    Cellular heterogeneity is regarded as a major factor for treatment response and resistance in a variety of malignant tumors, including malignant melanoma. More recent developments of single-cell sequencing technology provided deeper insights into this phenomenon. Single-cell data were used to identify prognostic subtypes of melanoma tumors, with a special emphasis on immune cells and fibroblasts in the tumor microenvironment. Moreover, treatment resistance to checkpoint inhibitor therapy has been shown to be associated with a set of differentially expressed immune cell signatures unraveling new targetable intracellular signaling pathways. Characterization of T cell states under checkpoint inhibitor treatment showed that exhausted CD8+ T cell types in melanoma lesions still have a high proliferative index. Other studies identified treatment resistance mechanisms to targeted treatment against the mutated BRAF serine/threonine protein kinase including repression of the melanoma differentiation gene microphthalmia-associated transcription factor (MITF) and induction of AXL receptor tyrosine kinase. Interestingly, treatment resistance mechanisms not only included selection processes of pre-existing subclones but also transition between different states of gene expression. Taken together, single-cell technology has provided deeper insights into melanoma biology and has put forward our understanding of the role of tumor heterogeneity and transcriptional plasticity, which may impact on innovative clinical trial designs and experimental approaches

    Integrated Multi-Omics Maps of Lower-Grade Gliomas

    Get PDF
    Multi-omics high-throughput technologies produce data sets which are not restricted to only one but consist of multiple omics modalities, often as patient-matched tumour specimens. The integrative analysis of these omics modalities is essential to obtain a holistic view on the otherwise fragmented information hidden in this data. We present an intuitive method enabling the combined analysis of multi-omics data based on self-organizing maps machine learning. It “portrays” the expression, methylation and copy number variations (CNV) landscapes of each tumour using the same gene-centred coordinate system. It enables the visual evaluation and direct comparison of the different omics layers on a personalized basis. We applied this combined molecular portrayal to lower grade gliomas, a heterogeneous brain tumour entity. It classifies into a series of molecular subtypes defined by genetic key lesions, which associate with large-scale effects on DNA methylation and gene expression, and in final consequence, drive with cell fate decisions towards oligodendroglioma-, astrocytoma- and glioblastoma-like cancer cell lineages with different prognoses. Consensus modes of concerted changes of expression, methylation and CNV are governed by the degree of co-regulation within and between the omics layers. The method is not restricted to the triple-omics data used here. The similarity landscapes reflect partly independent effects of genetic lesions and DNA methylation with consequences for cancer hallmark characteristics such as proliferation, inflammation and blocked differentiation in a subtype specific fashion. It can be extended to integrate other omics features such as genetic mutation, protein expression data as well as extracting prognostic markers

    The Evolving Faces of the SARS-CoV-2 Genome

    Get PDF
    Surveillance of the evolving SARS-CoV-2 genome combined with epidemiological monitoring and emerging vaccination became paramount tasks to control the pandemic which is rapidly changing in time and space. Genomic surveillance must combine generation and sharing sequence data with appropriate bioinformatics monitoring and analysis methods. We applied molecular portrayal using self-organizing maps machine learning (SOM portrayal) to characterize the diversity of the virus genomes, their mutual relatedness and development since the beginning of the pandemic. The genetic landscape obtained visualizes the relevant mutations in a lineage-specific fashion and provides developmental paths in genetic state space from early lineages towards the variants of concern alpha, beta, gamma and delta. The different genes of the virus have specific footprints in the landscape reflecting their biological impact. SOM portrayal provides a novel option for ‘bioinformatics surveillance’ of the pandemic, with strong odds regarding visualization, intuitive perception and ‘personalization’ of the mutational patterns of the virus genomes

    Transcriptome-Guided Drug Repositioning

    Get PDF
    Drug repositioning can save considerable time and resources and significantly speed up the drug development process. The increasing availability of drug action and disease-associated transcriptome data makes it an attractive source for repositioning studies. Here, we have developed a transcriptome-guided approach for drug/biologics repositioning based on multi-layer self-organizing maps (ml-SOM). It allows for analyzing multiple transcriptome datasets by segmenting them into layers of drug action- and disease-associated transcriptome data. A comparison of expression changes in clusters of functionally related genes across the layers identifies “drug target” spots in disease layers and evaluates the repositioning possibility of a drug. The repositioning potential for two approved biologics drugs (infliximab and brodalumab) confirmed the drugs’ action for approved diseases (ulcerative colitis and Crohn’s disease for infliximab and psoriasis for brodalumab). We showed the potential efficacy of infliximab for the treatment of sarcoidosis, but not chronic obstructive pulmonary disease (COPD). Brodalumab failed to affect dysregulated functional gene clusters in Crohn’s disease (CD) and systemic juvenile idiopathic arthritis (SJIA), clearly indicating that it may not be effective in the treatment of these diseases. In conclusion, ml-SOM offers a novel approach for transcriptome-guided drug repositioning that could be particularly useful for biologics drugs
    corecore